Make active_work match records_per_batch #1316
Conversation
lgtm, but it seems some tests are broken.
```rust
impl<'a> DZKPUpgraded<'a> {
    pub(super) fn new(
        validator_inner: &Arc<MaliciousDZKPValidatorInner<'a>>,
        base_ctx: MaliciousContext<'a>,
    ) -> Self {
        // Adjust active_work to be at least records_per_batch. If it is less, we will
        // stall, since every record in the batch remains incomplete until the batch
        // is validated.
```
we should probably log the fact that we've adjusted it - maybe at the debug level
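A minimal sketch of how that could look, assuming the `tracing` crate (which ipa already uses); the helper function name and exact message are hypothetical:

```rust
use std::num::NonZeroUsize;

// Hypothetical helper: raise active_work to records_per_batch, logging the
// adjustment at debug level when it happens.
fn adjust_active_work(active_work: NonZeroUsize, records_per_batch: usize) -> NonZeroUsize {
    if active_work.get() < records_per_batch {
        tracing::debug!(
            "adjusting active_work from {} to records_per_batch = {}",
            active_work,
            records_per_batch,
        );
        // Safe: records_per_batch > active_work >= 1, so it is non-zero.
        NonZeroUsize::new(records_per_batch).unwrap()
    } else {
        active_work
    }
}
```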
```rust
#[tokio::test]
async fn select_semi_honest() {
    test_select_semi_honest::<BA8>().await;
```
is it worth testing it for weird types like BA3 and BA7 as well?
Yes, I think so. I had BA4 but took it out because it's not boolean_vector! (not for any good reason other than we haven't needed it). But I think it is worth having a less-than-one-byte case, and maybe even adding a new BA type so we can cover the between-one-and-two-bytes case.
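For what it's worth, a sketch of what those extra cases might look like, assuming BA3 and BA7 satisfy the same bounds as BA8 does in test_select_semi_honest (the test name is hypothetical):

```rust
#[tokio::test]
async fn select_semi_honest_small_types() {
    // Less-than-one-byte widths, per the discussion above.
    test_select_semi_honest::<BA3>().await;
    test_select_semi_honest::<BA7>().await;
}
```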
```rust
    (1usize << log_count, 1usize << log_multiplication_amount)
}

fn record_count_strategy() -> impl Strategy<Value = usize> {
    prop_oneof![1usize..=512, (1usize..=9).prop_map(|i| 1usize << i)]
```
I am being stupid and it is 12pm now - is it really testing
What it does is:
- 50% of the time it picks a random power of two between 2 and 512 (9 possible values)
- 50% of the time it picks a random integer between 1 and 512 (512 possible values)

Two reasons to focus on powers of two, although I didn't put a huge amount of thought into this (e.g. it just occurred to me to add …):
- Because we plan to restrict batch sizes to powers of two.
- As the count increases, it makes sense to me to sample the space more sparsely; there's relatively less that's interesting about testing batch sizes 399, 400, and 401 than about testing batch sizes 7, 8, and 9.
I think it is clearer to have the two prop_oneof cases on different lines, but rustfmt insisted on a single line.
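For reference, one way to keep the two cases on separate lines is to opt that item out of rustfmt; the `#[rustfmt::skip]` here is a suggestion, not what the PR actually does:

```rust
use proptest::prelude::*;

#[rustfmt::skip]
fn record_count_strategy() -> impl Strategy<Value = usize> {
    prop_oneof![
        // ~50%: any record count from 1 to 512.
        1usize..=512,
        // ~50%: a power of two from 2 to 512 (2^1 through 2^9).
        (1usize..=9).prop_map(|i| 1usize << i),
    ]
}
```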
```rust
// Adjust active_work to be at least records_per_batch. The MAC validator
// currently configures the batcher with records_per_batch = active_work, which
// makes this adjustment a no-op, but we do it this way to match the DZKP validator.
let records_per_batch = batch.lock().unwrap().records_per_batch();
```
would it be better to assert this to make sure we don't miss the misalignment between MAC and ZKP?
Sure, I'm fine with that.
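A minimal sketch of that assertion, reusing names from the diffs above (`ctx` as the context handle and the exact message are assumptions):

```rust
let records_per_batch = batch.lock().unwrap().records_per_batch();
assert_eq!(
    records_per_batch,
    ctx.active_work().get(),
    "the MAC validator is expected to configure the batcher \
     with records_per_batch == active_work",
);
```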
```diff
@@ -217,7 +217,7 @@ impl<'a, F: ExtendableField> BatchValidator<'a, F> {

         // TODO: Right now we set the batch work to be equal to active_work,
         // but it does not need to be. We can make this configurable if needed.
-        let records_per_batch = ctx.active_work().get().min(total_records.get());
+        let records_per_batch = ctx.active_work().get();
```
Reducing if larger than total_records may have been necessary with an earlier version of the batcher, but the current version should take care of that internally, so I removed it here. No relation to the rest of these changes, though.
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1316      +/-   ##
==========================================
+ Coverage   93.45%   93.52%   +0.07%
==========================================
  Files         207      208       +1
  Lines       33451    33802     +351
==========================================
+ Hits        31260    31613     +353
+ Misses       2191     2189       -2
```

View full report in Codecov by Sentry.
```rust
    for<'a> Replicated<V>: BooleanArrayMul<DZKPUpgradedSemiHonestContext<'a, NotSharded>>,
    Standard: Distribution<V>,
{
    let world = TestWorld::default();
```
I just realized that we probably want to use the same seed for the rng and the test world (it is supported via the TestWorldConfig struct). That way we can make it reproducible if it ever fails.
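A hypothetical sketch of what that might look like; `with_seed` and `new_with` are assumed names for illustration rather than the confirmed TestWorldConfig API:

```rust
use rand::{rngs::StdRng, SeedableRng};

// Drive both the test world and the input RNG from one seed so a failing
// case can be replayed (assumed API; see above).
let seed = 42;
let world = TestWorld::new_with(TestWorldConfig::default().with_seed(seed));
let mut rng = StdRng::seed_from_u64(seed);
```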
I made an issue for this. It's definitely worth doing in general, but it doesn't seem all that important for this particular test case, where the input values should be unrelated to the behavior of the test.
The newest commit is to fix the failure in https://github.com/andyleiserson/ipa/actions/runs/11061292252/job/30733595263.
Draft runs: 100k: https://draft-mpc.vercel.app/query/view/agile-book2024-09-27T1606 (successful).

I suspect the issue is that, without the changes to make batch sizes powers of two and align the read size with batch sizes, we can't remove the change that uses active_work as the batch size for attribution.
Force-pushed from 2333492 to d2512f1.
Draft result for 2M records at d2512f1: https://draft-mpc.vercel.app/query/view/drunk-widow2024-09-27T1647. It failed due to #1308, but when I added the workaround (that was 2333492), CI failed due to excessive memory allocation. The 2M draft run at 2333492 passed: https://draft-mpc.vercel.app/query/view/iron-input2024-09-27T1735.
Here's the failure with large batch size for breakdown reveal aggregation: https://github.com/private-attribution/ipa/actions/runs/11075152193/job/30775478393
Adjust active_work to match records_per_batch. If it is less, we will stall, since every record in the batch remains incomplete until the batch is validated. More might be okay, but making it the same seems simpler for now.
This change also modifies an existing DZKP proptest to be more extensive. I believe the test is capable of exposing both the hang when active_work is less than the batch size (when run without the fix in this PR), and the hang due to read_size not dividing active_work (not yet fixed).
I moved a couple of tests that do DZKPs with various BA types from secret_sharing::vector::impls to the DZKP module. My intent was to adapt them to cover the interaction with record size, but I haven't done that yet. (The move is in a separate commit, although in the current state I don't think there's an interesting diff on top of the move anyway.)